Part of Speech Tagger for Kannada
Abstract
Part-of-speech (POS) tagging is a well-understood problem in NLP. Its importance stems from the fact that POS tagging is one of the first stages in many natural language processing pipelines. POS tagging is the process of assigning a part-of-speech tag or other lexical class marker to every word in a sentence, and it plays a crucial role in several areas of NLP, including machine translation (MT). In linguistics, part-of-speech tagging, also termed grammatical tagging or word-category disambiguation, is the process of marking up the words in a text or corpus as corresponding to a particular part of speech, based on both their definition and their context, that is, their relationship with adjacent and related words in a phrase, sentence, or paragraph. In other words, it can be defined as the automatic annotation of each word in a corpus with its syntactic category. It is similar to the process of tokenization for computer languages. A part of speech is a grammatical category; common examples include verbs, nouns, adjectives, adverbs, and determiners.
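As a rough illustration of tagging by both definition and context, the sketch below looks each word up in a tiny lexicon and uses the previous tag to resolve an ambiguous word. The lexicon, the tag names, and the single context rule are all hypothetical and use English for readability; this is not the tagger described in the paper.

```python
# Toy illustration only: LEXICON, the tag names, and the context rule
# are hypothetical assumptions, not taken from the paper.
LEXICON = {
    "the": {"DET"},
    "a": {"DET"},
    "dogs": {"NOUN"},
    "bark": {"NOUN", "VERB"},  # ambiguous by definition alone
    "tree": {"NOUN"},
    "loudly": {"ADV"},
}


def tag(tokens):
    """Assign one tag per token using the lexicon plus the previous tag."""
    tags = []
    for token in tokens:
        # Unknown words default to NOUN (an arbitrary choice for this sketch).
        candidates = LEXICON.get(token.lower(), {"NOUN"})
        if len(candidates) == 1:
            choice = next(iter(candidates))
        elif tags and tags[-1] == "DET" and "NOUN" in candidates:
            # Context rule: an ambiguous word right after a determiner is a noun.
            choice = "NOUN"
        elif "VERB" in candidates:
            choice = "VERB"
        else:
            choice = sorted(candidates)[0]
        tags.append(choice)
    return list(zip(tokens, tags))


print(tag("the dogs bark loudly".split()))
print(tag("the bark".split()))
```

The word "bark" receives a different tag in the two sentences because the preceding tag differs, which is the kind of context dependence the abstract describes. Practical taggers learn such decisions from annotated corpora rather than from hand-written rules.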
Similar resources
A Maximum Entropy Approach to Kannada Part Of Speech Tagging
Part Of Speech (POS) tagging is the most important pre-processing step in almost all Natural Language Processing (NLP) applications. It is defined as the process of classifying each word in a text with its appropriate part of speech. In this paper, the Maximum Entropy model, a probabilistic classifier, is applied experimentally to the tagging of Kannada sentences. Kannada language is agglutinat...
Cross Language POS Taggers (and other Tools) for Indian Languages: An Experiment with Kannada using Telugu Resources
Indian languages are known to have a large speaker base, yet some of these languages have minimal or inefficient linguistic resources. For example, Kannada is relatively resource-poor compared to Malayalam, Tamil and Telugu, which in turn are relatively poor compared to Hindi. Many Indian language pairs exhibit high similarities in morphology and syntactic behaviour, e.g. Kannada is highly sim...
Language Identification of Kannada Language using N-Gram
Language identification is an important pre-processing step for any Natural Language Processing task. Kannada is an Indian language, and a lot of research is being carried out on Kannada language processing. Large parts of online documents such as websites combine Kannada and English sentences. Language identification is a preprocessing step for NLP tasks like POS tagging, Sentenc...
Morpheme Segmentation for Kannada Standing on the Shoulder of Giants
This paper studies the applicability of a set of state-of-the-art unsupervised morphological segmentation algorithms for the problem of morpheme boundary detection in Kannada, a resource-poor language with highly inflectional and agglutinative morphology. The choice of the algorithms for the experiment is based in part on their performance with highly inflected languages such as Finnish and Ben...
OCR for printed Kannada text to Machine editable format using Database approach
This paper describes an Optical Character Recognition (OCR) system for printed text documents in Kannada, a South Indian language. The proposed OCR system recognizes printed Kannada text and can handle all types of Kannada characters. The system first extracts an image of the Kannada script, then performs line segmentation on the image, then segments the words into sub-character level piece...